An artificial neural network-based radiomics model for predicting the radiotherapy response of advanced esophageal squamous cell carcinoma patients: a multicenter study

Radiotherapy benefits patients with advanced esophageal squamous cell carcinoma (ESCC) in terms of symptom relief and long-term survival. However, a substantial proportion of ESCC patients do not benefit from radiotherapy. This study aimed to establish and validate an artificial neural network-based radiomics model for the pretreatment prediction of the radiotherapy response of advanced ESCC using integrated data from readily available baseline computed tomography (CT) characteristics. A total of 248 patients with advanced ESCC who underwent baseline CT and received radiotherapy were enrolled in this study and analyzed with two types of radiomics models: machine learning and deep learning. The Att. Resnet50 pretrained network model showed the best performance, with AUCs of 0.876, 0.802 and 0.732 in the training, internal validation and external validation cohorts, respectively. The Att. Resnet50 pretrained network model also showed excellent calibration and significant clinical benefit according to the C index and decision curve analysis. This novel pretreatment radiomics model, established with deep learning methods, could be used to predict the radiotherapy response of patients with advanced ESCC and thus provide reliable evidence for therapeutic decision-making.

www.nature.com/scientificreports/ predictive information for treatment response 11,12 . Lu et al. 13 found that a deep learning-based model identified the origins of cancers of unknown primary with high accuracy. Zhong 14 indicated that multiparametric magnetic resonance imaging (mp-MRI)-based radiomics features could be considered prognostic factors in patients with localized prostate cancer after radiotherapy. Gao 15 showed that radiomics signatures based on longitudinal diffusion-weighted MRI could be used to estimate radiotherapy effects preoperatively. Zhu 16 reported that a nomogram based on computed tomography (CT) radiomic signatures and clinical factors showed adequate sensitivity and specificity in estimating the risk of local recurrence in nasopharyngeal carcinoma (NPC) after intensity-modulated radiotherapy (IMRT). Previous studies reported that radiomics features significantly improved the evaluation of the complete pathological response after neoadjuvant chemoradiation in EC patients 17 . However, few radiomics studies predicting the response to radiotherapy in ESCC have been reported. Herein, a large cohort of 248 patients with ESCC was used to develop a novel baseline CT-based radiomics signature model with a deep learning algorithm and to validate its performance in predicting the response to radiotherapy.

Materials and methods
Patients. Baseline information and imaging data, including demographic data, clinical data, pathological findings of pretreatment biopsies, pre- and postsurgical imaging data, and surgical records, of patients with ESCC who underwent radiotherapy at Institution 1 (The First Affiliated Hospital of Xi'an Jiaotong University) from 2013 to 2019 were collected and analyzed. The same data were also retrieved for patients with ESCC who received radiotherapy at Institution 2 (The Second Affiliated Hospital of Xi'an Jiaotong University) from 2017 to 2019. All patients underwent a CT examination for positioning before the start of radiotherapy, and the CT data were collected retrospectively from 2021 to 2022. The main inclusion criteria were as follows: (1) biopsy-diagnosed ESCC; (2) advanced ESCC clinically diagnosed by CT and contrast imaging; (3) completion of the full course of radical radiotherapy without dropping out during treatment; and (4) pre- and postradiotherapy imaging data recorded at the same institution. The main exclusion criteria were as follows: (1) no biopsy or pathological confirmation; (2) surgery, or more than two weeks of adjuvant chemotherapy, before radiotherapy; and (3) no follow-up imaging after radiotherapy. The general classification of all patients was based on the Chinese CSCO guidelines 18 . Tumors were staged according to the clinical staging standard for nonoperative esophageal cancer (Draft) 19 . Written informed consent was obtained from all patients in this study. All methods were carried out in accordance with the relevant guidelines and regulations. Patients were enrolled consecutively, and the Ethics Committee of Xi'an Jiaotong University approved this study.
Twenty percent of the patients from Institution 1 were randomly selected for the internal validation cohort, and the rest of the patients from Institution 1 were grouped into the training cohort. The patients from Institution 2 were chosen as the external validation cohort. The detailed experimental flow is illustrated in Fig. 1.
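The cohort assignment above can be sketched in Python as a patient-level split. This is a minimal illustration, not the authors' code: the function name `split_patients`, the ID format and the random seed are hypothetical, and rounding 20% of the Institution 1 patient count to the nearest integer is an assumption.

```python
import random


def split_patients(institution1_ids, institution2_ids, val_fraction=0.2, seed=42):
    """Split patient IDs into training, internal validation and external validation cohorts.

    The split is done on patient IDs (not on images), so all slices of a given
    patient end up in exactly one cohort.
    """
    ids = sorted(set(institution1_ids))      # de-duplicate and fix the order before shuffling
    rng = random.Random(seed)                # hypothetical seed, only for reproducibility
    rng.shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    internal_val = sorted(ids[:n_val])       # ~20% of Institution 1 patients
    training = sorted(ids[n_val:])           # remaining Institution 1 patients
    external_val = sorted(set(institution2_ids))  # all Institution 2 patients
    return training, internal_val, external_val


# Usage with hypothetical patient IDs
train, internal, external = split_patients([f"I1-{i:03d}" for i in range(10)], ["I2-001", "I2-002"])
print(len(train), len(internal), len(external))  # 8 2 2
```

Splitting by patient rather than by image matters here because each patient contributes several 2D slices; an image-level split could place slices of the same tumor in both the training and validation data.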
Radiotherapy treatment and response assessment. Before radiotherapy, all patients underwent CT simulation for positioning on a Brilliance CT simulator (Philips, UK). The main operating parameters were as follows: tube voltage: 140 kV; tube current: 500 mA; beam spacing: 0.625 mm layer thickness; rotation time: 0.5 s; matrix: 512 × 512; detector size: 24 mm. The scanning parameters were as follows: slice thickness ≤ 5 mm; interval ≤ 5 mm.
Two radiotherapy technologists and physicians were required to accompany the patients during CT scanning. All patients fasted for 4 to 6 h before CT scanning and drank 0.5 L of water during scanning to dilate the esophagus as much as possible. All patients were asked to hold their breath while the chest was scanned continuously with a multislice spiral CT scanner. The scanning range extended from 5 cm above the upper edge of the supraclavicular fossa down to the level of the first lumbar vertebra. The CT data were then transferred to the workstation for reconstruction.
All patients in this study received radical radiotherapy with a total dose of 60 to 66 Gy delivered with 6 MV photons, at 1.8-2.0 Gy per fraction, five fractions per week. Organs at risk (OARs), including the bilateral lungs, spinal cord, stomach and heart, were contoured for protection. The maximum tolerated doses for key normal structures were as follows: spinal cord: < 40 Gy; heart: V40 ≤ 30%; bilateral lungs: V20 ≤ 28% and V30 ≤ 20%; and stomach: V40 ≤ 40%. Because the data were retrieved retrospectively, the actual radiotherapy dose differed slightly between patients. Nevertheless, every radiotherapy plan fell within the range of radical doses recommended by the NCCN guidelines 20 .
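The dose-volume limits above use the V_x convention: the percentage of an organ's volume that receives at least x Gy. The sketch below checks those limits against per-voxel doses. It is illustrative only, assuming equal voxel volumes and reading the spinal cord limit as a maximum point dose; the function names, dictionary keys and toy doses are hypothetical.

```python
import numpy as np


def v_x(dose_gy, threshold_gy):
    """V_x: percentage of an organ's voxels receiving at least `threshold_gy` Gy.

    Assumes all voxels have the same volume, so a voxel count equals a volume fraction.
    """
    dose_gy = np.asarray(dose_gy, dtype=float)
    return 100.0 * np.count_nonzero(dose_gy >= threshold_gy) / dose_gy.size


def check_oar_constraints(oar_doses):
    """Return {constraint: passed} for the OAR limits listed in the text.

    `oar_doses` maps hypothetical organ keys to 1D arrays of per-voxel dose (Gy).
    """
    return {
        "spinal_cord_max<40Gy": float(np.max(oar_doses["spinal_cord"])) < 40.0,
        "heart_V40<=30%": v_x(oar_doses["heart"], 40) <= 30.0,
        "lungs_V20<=28%": v_x(oar_doses["lungs"], 20) <= 28.0,
        "lungs_V30<=20%": v_x(oar_doses["lungs"], 30) <= 20.0,
        "stomach_V40<=40%": v_x(oar_doses["stomach"], 40) <= 40.0,
    }


# Usage with toy dose values (not patient data)
doses = {
    "spinal_cord": [30.0, 39.5],
    "heart": [10.0, 20.0, 45.0, 50.0],   # V40 = 50%, so the heart limit is violated
    "lungs": [5.0] * 8 + [25.0, 35.0],   # V20 = 20%, V30 = 10%
    "stomach": [41.0, 10.0, 10.0],       # V40 = 33.3%
}
print(check_oar_constraints(doses))
```

For reference, at 1.8-2.0 Gy per fraction the 60-66 Gy prescriptions correspond to about 30 to 37 fractions (60/2.0 = 30; 66/1.8 ≈ 36.7), or roughly 6 to 7.5 weeks at five fractions per week.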
Imaging specialists reviewed the patients' imaging data and measured the maximum tumor diameter in each plane, as recommended by the RECIST 1.1 guidelines 21 . The maximum diameter shrinkage rate of the tumor, which is widely used in clinical practice to evaluate treatment response, was then calculated as (preradiotherapy maximum diameter - postradiotherapy maximum diameter)/preradiotherapy maximum diameter. Based on the shrinkage rates of all patients, 0.5 was considered a candidate threshold for dividing patients into two categories, and after experiments it was confirmed as the optimal cutoff value: patients were divided into those with a tumor reduction rate greater than 50% and those with a tumor reduction rate less than 50%.
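The response labelling can be sketched as follows, assuming the shrinkage rate is the relative reduction (pre - post)/pre of the maximum diameter, so that a value above 0.5 corresponds to a tumor reduction greater than 50%. The function names are hypothetical, and assigning a rate of exactly 0.5 to the non-responder group is an assumption, since the text does not say how that case is handled.

```python
def shrinkage_rate(pre_max_diameter_mm, post_max_diameter_mm):
    """Relative reduction in maximum tumor diameter: (pre - post) / pre."""
    if pre_max_diameter_mm <= 0:
        raise ValueError("pre-treatment diameter must be positive")
    return (pre_max_diameter_mm - post_max_diameter_mm) / pre_max_diameter_mm


def response_label(pre_max_diameter_mm, post_max_diameter_mm, cutoff=0.5):
    """1 = reduction greater than the cutoff (responder), 0 otherwise."""
    return int(shrinkage_rate(pre_max_diameter_mm, post_max_diameter_mm) > cutoff)


# Usage: a tumor shrinking from 40 mm to 15 mm
print(shrinkage_rate(40.0, 15.0), response_label(40.0, 15.0))  # 0.625 1
```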

Delineation of regions of interest (ROIs).
The tumor was delineated on the continuous CT planes of all patients by two radiologists with more than eight years of experience in radiotherapy target area delineation, using Monaco 5.2. A third radiologist with more than 15 years of experience then reviewed the contours to obtain the final radiotherapy target area, ensuring that the target area on the continuous CT planes covered the whole tumor of each patient. In this study, the delineated target area images were used to construct the radiotherapy response model. To minimize the influence of the tumor margin on the model, the images that best depicted the tumor site were chosen for each patient.
Dataset pre-processing. The patient CT and target delineation data collected in this experiment were stored in DICOM format, and the CT data in the DICOM files were converted into 2D PNG images with Python. In this way, 10 to 60 2D slices containing the delineated tumor target area were extracted for each patient. Radiologists then selected the most representative 2D CT images of the tumor (usually the images around the maximum tumor diameter). The dataset from Institution 1 was randomly divided into training and internal validation sets on a patient basis, and the data from Institution 2 were assigned to the external validation cohort. In the machine learning experiments, the 2D CT images were transformed into feature vectors by feature extraction before the subsequent experiments. For the artificial neural networks, data augmentation methods such as rotation, flipping, zooming and distortion were applied to the CT images, which expanded the CT image dataset to 6000 images. The 2D CT images were used directly as input to the artificial neural network models.
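The conversion and augmentation steps can be sketched with NumPy. This is a hedged illustration rather than the authors' pipeline: the window level and width (a common mediastinal window, 40/400 HU), the 80-100% zoom range and the restriction to 90-degree rotations are assumptions, elastic distortion is left out, and reading the DICOM files is not shown. An 8-bit array produced this way could be written to PNG with Pillow's `Image.fromarray(...).save(...)`.

```python
import numpy as np


def window_to_uint8(hu, level=40.0, width=400.0):
    """Linearly map Hounsfield units inside [level - width/2, level + width/2] to 0..255.

    The default mediastinal window (level 40 HU, width 400 HU) is an assumption.
    """
    lo, hi = level - width / 2.0, level + width / 2.0
    scaled = (np.clip(np.asarray(hu, dtype=float), lo, hi) - lo) / (hi - lo)
    return np.round(scaled * 255.0).astype(np.uint8)


def augment(img, rng):
    """Return one randomly augmented copy of a 2D slice.

    Applies a random 90-degree rotation, a random left-right flip and a random
    central crop ("zoom") resized back to the rotated slice shape with
    nearest-neighbour sampling. Elastic distortion is omitted in this sketch.
    """
    out = np.rot90(img, k=int(rng.integers(0, 4)))
    if rng.random() < 0.5:
        out = np.fliplr(out)
    h, w = out.shape
    scale = rng.uniform(0.8, 1.0)                  # keep 80-100% of the field of view
    ch, cw = max(1, int(h * scale)), max(1, int(w * scale))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = out[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h                  # nearest-neighbour row indices
    cols = np.arange(w) * cw // w                  # nearest-neighbour column indices
    return crop[np.ix_(rows, cols)]


# Usage: expand one toy slice into several augmented copies
rng = np.random.default_rng(0)
slice_u8 = window_to_uint8(np.linspace(-1000, 1000, 64 * 64).reshape(64, 64))
copies = [augment(slice_u8, rng) for _ in range(5)]
print(len(copies), copies[0].shape, copies[0].dtype)  # 5 (64, 64) uint8
```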
Prediction using machine learning. In this study, patients from Institution 1 (training and internal validation cohorts) were used to construct and verify the classification model. Radiomic features were extracted from the CT images using PyRadiomics version 3.0. A total of 102 2D features were extracted for each patient, including first-order, 2D shape, gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), gray level size zone matrix (GLSZM), neighboring gray-tone difference matrix (NGTDM) and gray level dependence matrix (GLDM) features 22 . Features with a pairwise Pearson correlation coefficient greater than 0.8 were grouped, and the less predictive features in each group were ignored by the feature selection algorithm (Supplementary Fig. S1). The random forest algorithm was used to reduce the data dimensionality and select the most predictive features 23 . The feature importance used by the selection algorithm is defined as follows:
$$\mathrm{VI}(X) = \frac{1}{N}\sum_{t=1}^{N}\left(\mathrm{err}_{\mathrm{OOB2}}^{(t)} - \mathrm{err}_{\mathrm{OOB1}}^{(t)}\right)$$
in which X represents the feature, N represents the number of trees in the forest, $\mathrm{err}_{\mathrm{OOB2}}^{(t)}$ represents the out-of-bag error of tree t after noise is added to feature X, and $\mathrm{err}_{\mathrm{OOB1}}^{(t)}$ represents the out-of-bag error without added noise. Ten sets of experiments, with the number of features ranging from one to ten, were performed to determine the optimal number of features for the model. The five most predictive features achieved the best performance and were selected to train the classifier. Meanwhile,
Figure 1. Flowchart illustrating the steps for developing the radiomic sensitivity prediction model, including patient categorization, tumor volume ROI delineation, model selection and output analysis. The flowchart highlights the end-to-end advantage of the neural network model: predictions are obtained directly, without the tedious feature extraction steps of machine learning. In practical applications, the neural network model also generalizes better.
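The two feature selection steps described under "Prediction using machine learning" can be sketched as follows. In this simplification, which is an assumption rather than the authors' implementation, the "noise" added to feature X is a random permutation of its column, and the error is measured on a single evaluation set with any prediction function instead of on each tree's out-of-bag samples; correlated groups are resolved greedily by keeping the more important feature. All function names are hypothetical.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Importance of each feature = mean (err_noisy - err_clean).

    `err_clean` is the error rate of `predict` on (X, y); `err_noisy` is the error
    rate after the feature's column is randomly permuted (the "noise").
    """
    rng = np.random.default_rng(seed)
    err_clean = np.mean(predict(X) != y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        diffs = []
        for _ in range(n_repeats):
            X_noisy = X.copy()
            X_noisy[:, j] = rng.permutation(X_noisy[:, j])
            diffs.append(np.mean(predict(X_noisy) != y) - err_clean)
        scores[j] = np.mean(diffs)
    return scores


def drop_correlated(X, importance, threshold=0.8):
    """Greedy redundancy filter: visit features from most to least important and keep
    one only if its |Pearson r| with every already-kept feature is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in np.argsort(importance)[::-1]:
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(int(j))
    return sorted(kept)


def top_k(importance, k):
    """Indices of the k most important features (k was swept from 1 to 10 in the text)."""
    return sorted(int(j) for j in np.argsort(importance)[::-1][:k])


# Usage on toy data: feature 1 is a near copy of feature 0, feature 2 is noise
rng = np.random.default_rng(1)
x0 = rng.normal(size=200)
X = np.column_stack([x0, 2 * x0 + 0.01 * rng.normal(size=200), rng.normal(size=200)])
y = (x0 > 0).astype(int)
predict = lambda A: (A[:, 0] > 0).astype(int)  # stand-in for a fitted classifier
imp = permutation_importance(predict, X, y)
print(drop_correlated(X, imp))  # [0, 2]
```

With real data, library implementations such as scikit-learn's `RandomForestClassifier` (which supports `oob_score=True`) and `sklearn.inspection.permutation_importance` would normally replace the hand-written loop.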
Prediction using deep learning. In this study, patients from Institution 1 were used to build the training and internal validation cohorts, and patients from Institution 2 formed the external validation cohort used to verify generalization. CNN models have proven very effective in image classification [27][28][29] . End-to-end CNN models produce precise predictions without a separate image feature extraction step, which substantially improves efficiency. In this study, multiple neural networks were used to classify the patients' CT images. The learning rate was set to 0.0005, and the root mean square propagation (RMSprop) optimizer was used. Binary cross entropy was used as the loss function of the model:
$$H_p(q) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log\left(p(y_i)\right) + \left(1 - y_i\right)\log\left(1 - p(y_i)\right)\right]$$
in which N represents the number of images, y_i represents the label of each image, p(y_i) represents the probability of the image being positive, and q represents the real distribution. To prevent overfitting, a dropout layer with a dropout rate of 0.2 (randomly ignoring 20% of the neurons) was applied. Finally, a sigmoid function was applied before the output layer to normalize the outputs to the range (0, 1). The sigmoid function is defined as follows:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
A batch size of 16 achieved the best performance among the sizes tested. The trained neural network model outputs the probability of each patient's radiotherapy sensitivity (a probability greater than 0.5 indicates sensitivity, and a probability less than 0.5 indicates resistance). The detailed structure of the neural network models is illustrated in Fig. 2a.
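The loss, output activation and decision rule above can be written out in NumPy as a sketch. The clipping constant `eps` is an assumption added to avoid log(0), and mapping a probability of exactly 0.5 to "resistant" is also an assumption, since the text only defines the cases above and below 0.5. Deep learning frameworks such as Keras provide these pieces built in (a sigmoid output unit trained with a binary cross-entropy loss).

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean BCE over N images; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def classify(p_pred, threshold=0.5):
    """1 = predicted radiosensitive (p > threshold), 0 = predicted resistant."""
    return (np.asarray(p_pred) > threshold).astype(int)


# Usage with three toy logits
logits = np.array([-2.0, 0.0, 3.0])
probs = sigmoid(logits)
print(classify(probs))  # [0 0 1]
print(binary_cross_entropy([0, 1, 1], probs))
```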
Research on pretrained neural networks has shown state-of-the-art performance in image classification tasks 30,31 , and the channel attention mechanism has been shown to improve the performance of deep learning models. In this study, a ResNet50 pretrained on ImageNet (from Keras) with an added channel attention layer was applied [32][33][34] . The channel attention layer captures the most informative channels of the model's output and is computed as follows:
$$M_c(F) = \sigma\left(\mathrm{MLP}\left(\mathrm{AvgPool}(F)\right) + \mathrm{MLP}\left(\mathrm{MaxPool}(F)\right)\right)$$
in which MLP represents the multilayer perceptron, F represents the input feature map, AvgPool and MaxPool denote global average and max pooling over the spatial dimensions, and σ is the sigmoid function. The learning rate was set to 0.01, and the Adam optimizer was used. The loss function in these experiments was also binary cross entropy, and the dropout rate of the dropout layer was set to 0.5. After multiple experiments, a batch size of 32 was chosen because it achieved the best performance. The pretrained model outperformed all the other methods applied in this study, and the classifier with the best AUC was used for further exploration. The detailed structure of the neural network models is illustrated in Fig. 2b.
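A minimal NumPy sketch of the channel attention formula follows, assuming the CBAM-style formulation: a shared two-layer MLP with a ReLU hidden layer applied to the global average- and max-pooled channel descriptors. The reduction ratio r = 16, the omitted biases and the random weights are illustrative assumptions; in a trained model the MLP weights are learned together with the rest of the network.

```python
import numpy as np


def channel_attention(F, W0, W1):
    """CBAM-style channel attention for a feature map F of shape (H, W, C).

    W0 (C x C/r) and W1 (C/r x C) are the weights of a shared two-layer MLP
    (biases omitted). Returns the rescaled feature map and the channel weights M_c(F).
    """
    def mlp(v):
        return np.maximum(v @ W0, 0.0) @ W1     # shared MLP with a ReLU hidden layer

    avg_desc = F.mean(axis=(0, 1))             # AvgPool(F): shape (C,)
    max_desc = F.max(axis=(0, 1))              # MaxPool(F): shape (C,)
    logits = mlp(avg_desc) + mlp(max_desc)
    weights = 1.0 / (1.0 + np.exp(-logits))    # sigmoid, one weight per channel
    return F * weights, weights                 # broadcast over H and W


# Usage with random toy tensors (ResNet50's last block outputs 7 x 7 x 2048 for 224 x 224 inputs)
rng = np.random.default_rng(0)
C, r = 64, 16                                  # r = 16 is an assumed reduction ratio
F = rng.normal(size=(7, 7, C))
W0 = rng.normal(scale=0.1, size=(C, C // r))
W1 = rng.normal(scale=0.1, size=(C // r, C))
out, w = channel_attention(F, W0, W1)
print(out.shape, w.shape)  # (7, 7, 64) (64,)
```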
To evaluate the predictive performance of each model, we used the area under the ROC curve (AUC) to measure its discriminative ability. However, the AUC cannot account for the clinical practicability of a prediction model. Therefore, decision curves were used to further evaluate each model. Decision curve analysis incorporates the preferences of decision makers and estimates the actual benefit of using a model in clinical practice; it therefore meets the practical needs of clinical decision-making and is increasingly used in clinical research 35 .
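Both evaluation measures can be sketched in NumPy. The AUC below uses its rank interpretation (the probability that a randomly chosen responder scores higher than a randomly chosen non-responder), and the net benefit at threshold probability pt uses the standard decision-curve definition TP/n - FP/n x pt/(1 - pt), compared against the "treat all" strategy. The function names are hypothetical, and treating a probability equal to pt as a positive prediction is an assumption.

```python
import numpy as np


def auc_rank(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def net_benefit(y_true, probs, pt):
    """Decision-curve net benefit of treating patients with predicted probability >= pt."""
    y = np.asarray(y_true)
    treat = np.asarray(probs, dtype=float) >= pt
    n = len(y)
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return float(tp / n - fp / n * pt / (1.0 - pt))


def net_benefit_treat_all(y_true, pt):
    """Reference strategy in which every patient is treated."""
    prevalence = float(np.mean(y_true))
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Usage with toy labels and scores (not study data)
y = np.array([0, 0, 1, 1])
p = np.array([0.1, 0.4, 0.35, 0.8])
print(auc_rank(y, p))  # 0.75
for pt in (0.2, 0.3, 0.5):
    print(pt, round(net_benefit(y, p, pt), 3), round(net_benefit_treat_all(y, pt), 3))
```

Plotting `net_benefit` over a range of thresholds, together with the "treat all" and "treat none" (net benefit 0) references, gives the decision curve.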

Results
The baseline clinical characteristics of patients and association analysis with radiotherapy. Our
Machine learning radiomics models using random forest for predicting response to radiotherapy. Four machine learning classifiers were used to construct radiomics models: linear regression, SVM with a linear kernel, SVM with a radial basis function kernel, and random forest. The ten most predictive features identified by the random forest algorithm were selected to train the classifiers. Among the classifiers, the random forest model showed the highest AUCs in the training and internal validation cohorts, 0.767 (95% CI, 0.734-0.790) and 0.594 (95% CI, 0.562-0.631), respectively. The SVM with a linear kernel achieved an AUC of 0.561 (95% CI, 0.530-0.594) in the internal validation cohort, and the SVM with a radial basis function kernel achieved an AUC of 0.539 (95% CI, 0.510-0.564) in the same cohort. The AUC of the linear regression model in the internal validation cohort was 0.589 (95% CI, 0.561-0.646) (Supplementary Fig. S3a-c). To compare the combined-feature model with single-feature models, each of the five most predictive features of the combined model was used individually to train a random forest model. Each single feature yielded a lower AUC than the combined features in the same radiomics model ( Fig. 3a and (Fig. 4a-c). This result indicated that CT features were feasible for constructing a reliable prognostic radiomics model. Compared with the machine learning radiomics models, the CNN radiomics models showed higher AUCs in all three cohorts, suggesting that deep learning radiomics models, which require neither dimensionality reduction nor removal of redundant features, improved the performance of the radiomics model.
The dimensionality reduction of radiomics features may discard potentially predictive information.
To estimate the predictive stability of the radiomics models and evaluate their benefit in clinical application, decision curves and calibration curves were used to evaluate the performance of the two neural network models. The decision curves showed that, when the preferences of decision makers were taken into account, the Att. pretrained model yielded greater clinical benefit than the CNN model (Fig. 4d, e). A calibration curve was applied to the pretrained ResNet50 model in the primary and internal cohorts (Fig. 4f) and showed a high degree of agreement between the predicted and actual outcomes.
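A binned calibration curve of the kind described above can be sketched as follows. Equal-width probability bins and the bin count are assumptions, since the text does not state how the calibration curve was constructed; a well-calibrated model gives points close to the diagonal, where the mean predicted probability equals the observed response rate.

```python
import numpy as np


def calibration_curve_bins(y_true, probs, n_bins=5):
    """Group predictions into equal-width probability bins and return, for each
    non-empty bin, the mean predicted probability and the observed event rate."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(probs, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)  # bin index 0..n_bins-1
    mean_pred, obs_rate = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            mean_pred.append(float(p[mask].mean()))
            obs_rate.append(float(y[mask].mean()))
    return np.array(mean_pred), np.array(obs_rate)


# Usage with toy predictions (not study data)
mean_pred, obs_rate = calibration_curve_bins([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9])
print(mean_pred, obs_rate)
```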

Discussion
Radiotherapy is considered one of the most crucial treatments for ESCC patients 36 . Models that predict the response to radiotherapy are highly instructive for individualized precision treatment 37 . Radiation response prediction in ESCC is difficult because predictive molecular markers of radiation sensitivity are lacking 38 . Moreover, traditional clinical characteristics and the general type of ESCC show limited correlation with the response to radiotherapy. Thus, a model that can efficiently predict the response to radiotherapy in patients with ESCC needs to be developed. Recently, AI-based radiomics signature models have been applied in different areas and have shown promising performance in predicting the response to radiotherapy [39][40][41][42][43] . Here, we aimed to design a pretreatment CT-based radiomics model for radiotherapy response prediction in patients with ESCC that can compensate for the shortage of predictive molecular markers. Machine learning and deep learning models have been widely used in radiomics research [44][45][46] . In previous studies, traditional machine learning algorithms such as random forest and SVM were used more frequently than deep learning algorithms because of limited cohort sizes 17,47 . Moreover, end-to-end deep learning algorithms have begun to be used in cancer research in recent years, with both radiographic and histopathological images 48,49 . Screening features for reproducibility, which is considered indispensable for reducing overfitting in traditional machine learning radiomics models, seems to improve the performance of the radiomics model. In contrast, end-to-end algorithms do not need dimensionality reduction and aim to make full use of all image information to draw conclusions 50 . However, these two types of algorithms have not been compared for predicting radiotherapy response.
To determine which of these two algorithms can construct a more effective radiomics model, we used both algorithms to construct radiomics models in the same ESCC patient cohorts. In the training, internal validation and external validation cohorts alike, the radiomics model constructed by the end-to-end deep learning algorithm showed better performance, which suggests that end-to-end deep learning algorithms deserve more attention in subsequent radiomics studies. We also compared a CNN model trained from scratch with a pretrained CNN model. Although pretrained neural network models have become an increasingly mainstream choice in recent research, pretrained models and models trained from scratch have rarely been compared in radiomics studies 51 ; our study provides some evidence on this comparison. Meanwhile, recent studies reported that the channel attention mechanism can significantly improve the performance of neural network models. In our study, the channel attention layer markedly eased the convergence difficulty caused by the large number of channels in the pretrained model and improved the performance of the model by nearly 3% in the external validation cohort. Recent advances in radiogenomics have shown that radiomics signatures have distinct correlations with gene expression patterns 37 . Radiomics signatures driven by different pathways involved in immune regulation, tumor proliferation, treatment response and cellular function further explain the biological basis of radiomics 52 . These findings suggest that a radiomics model can, to a certain extent, reflect intratumoral heterogeneity. Compared with traditional clinical characteristics, radiomics features better predict the response to radiotherapy.
Therefore, we hypothesize that the use of the complete CT image information by end-to-end algorithms may reflect tumor heterogeneity, and that this difference in how image information is extracted may explain the apparent performance gap between the two types of algorithms. However, these hypotheses have not yet been verified by radiogenomics or multiomics studies.
Our retrospective study was limited by temporal discontinuities among the included patients. Although we minimized differences in imaging data through a standardized process, differences in CT equipment between periods and institutions may have introduced bias into the imaging data. To our knowledge, this study is the first multicenter radiomics study of nonsurgical ESCC patients. However, the two institutions in our study are located in the same province, and the number of patients is limited; a larger patient population from other regions is still needed to evaluate the generalizability of the model. Finally, because of the cohort limitations, our study analyzed only the 2D radiomics phenotype and clinical characteristics, and the 3D radiomics phenotype did not perform well in our model. Therefore, the 3D radiomics phenotype still needs to be explored in future work. It is also necessary to combine other omics data to further reveal the biological significance of radiomics.
This is the first multicenter radiomics study to develop an Att. Resnet50 pretrained network radiomics model in patients with advanced ESCC. The model allows clinical decision-making to rely not only on clinicians' experience but also on an objective basis. Effective prediction of the radiotherapy response can provide these patients with reasonable individualized and precise treatment options, as well as timely alternative curative-intent treatments, to avoid unnecessary side effects of radiotherapy and improve the quality of life and survival outcomes of patients with advanced ESCC. In addition, our study uses existing routine diagnostic CT imaging, which adds no financial burden for patients. At the same time, the Att. Resnet50 pretrained network radiomics model does not require standardized extraction of radiomics signatures, making it more convenient for oncologists to predict the radiotherapy response at diagnosis.

Conclusion
Developing an Att. Resnet50 pretrained network radiomics model for predicting the response to radiotherapy in patients with advanced ESCC can not only help oncologists formulate effective individualized radiotherapy plans promptly and guide clinical decision-making, but also compensate for the lack of molecular markers for predicting radiosensitivity. We hope that our study can be included in the radiomics database of ESCC and be considered a baseline study of radiomics in advanced ESCC. The model also has the potential to be applied to other medical image classification tasks.

Data availability
The data sets generated and/or analyzed during the current study are available from the corresponding author on reasonable request. The code for our full approach is publicly available on GitHub (https://github.com/lqawakeme/RRAESCC).